Joint Training for Neural Machine Translation Models with Monolingual Data
Monolingual data have been demonstrated to be helpful in improving
translation quality of both statistical machine translation (SMT) systems and
neural machine translation (NMT) systems, especially in resource-poor or domain
adaptation tasks where parallel data are not rich enough. In this paper, we
propose a novel approach to better leveraging monolingual data for neural
machine translation by jointly learning source-to-target and target-to-source
NMT models for a language pair with a joint EM optimization method. The
training process starts with two initial NMT models pre-trained on parallel
data for each direction, and these two models are iteratively updated by
incrementally decreasing translation losses on training data. In each iteration
step, both NMT models are first used to translate monolingual data from one
language to the other, forming pseudo-training data for the other NMT model.
Then two new NMT models are learned from the parallel data together with the
pseudo-training data. Both NMT models are expected to improve, and better
pseudo-training data can be generated in the next step. Experimental results on
Chinese-English and English-German translation tasks show that our approach can
simultaneously improve the translation quality of both the source-to-target and
target-to-source models, significantly outperforming strong baseline systems
that are enhanced with monolingual data for model training, including
back-translation.
Comment: Accepted by AAAI 2018
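As a concrete illustration, below is a minimal Python sketch of the joint training loop described in the abstract. The helpers train_nmt(pairs) and translate(model, sentence) are hypothetical placeholders for any NMT toolkit, and the sketch omits the probability weighting of the full EM objective; it is a schematic of the idea, not the authors' implementation:

def joint_train(parallel, mono_src, mono_tgt, train_nmt, translate, n_iters=4):
    # parallel: list of (src, tgt) sentence pairs;
    # mono_src / mono_tgt: monolingual source / target sentences;
    # train_nmt: pairs -> model; translate: (model, sentence) -> sentence.
    # Pre-train both directions on the genuine parallel data.
    s2t = train_nmt(parallel)                           # source -> target
    t2s = train_nmt([(t, s) for s, t in parallel])      # target -> source

    for _ in range(n_iters):
        # Each model translates monolingual text, producing
        # pseudo-parallel data for the *other* direction.
        pseudo_for_t2s = [(translate(s2t, s), s) for s in mono_src]
        pseudo_for_s2t = [(translate(t2s, t), t) for t in mono_tgt]

        # Retrain each direction on real plus pseudo data; better models
        # yield better pseudo data in the next iteration.
        s2t = train_nmt(parallel + pseudo_for_s2t)
        t2s = train_nmt([(t, s) for s, t in parallel] + pseudo_for_t2s)

    return s2t, t2s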
On the Identifiability and Interpretability of Gaussian Process Models
In this paper, we critically examine the prevalent practice of using additive
mixtures of Matérn kernels in single-output Gaussian process (GP) models and
explore the properties of multiplicative mixtures of Matérn kernels for
multi-output GP models. For the single-output case, we derive a series of
theoretical results showing that the smoothness of a mixture of Matérn
kernels is determined by the least smooth component and that a GP with such a
kernel is effectively equivalent to the least smooth kernel component.
Furthermore, we demonstrate that none of the mixing weights or parameters
within individual kernel components are identifiable. We then turn our
attention to multi-output GP models and analyze the identifiability of the
covariance matrix A in the multiplicative kernel K(x, y) = A k(x, y), where
k(x, y) is a standard single-output kernel such as Matérn. We show that A is
identifiable up to a multiplicative constant, suggesting that multiplicative
mixtures are well suited for multi-output tasks. Our findings are supported by
extensive simulations and real applications for both single- and multi-output
settings. This work provides insight into kernel selection and interpretation
for GP models, emphasizing the importance of choosing appropriate kernel
structures for different tasks.
Comment: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
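For concreteness, here is a small NumPy sketch (illustrative only, not the paper's code) of the two constructions discussed above: an additive mixture of Matérn kernels, whose smoothness is governed by the least smooth component, and a multiplicative multi-output kernel built from an output covariance matrix A and a single base kernel k; the lengthscale and mixing weights are arbitrary choices for the example:

import numpy as np

def matern(r, ell, nu):
    # Matérn correlation at distance r, for nu in {0.5, 1.5, 2.5}.
    s = np.sqrt(2.0 * nu) * r / ell
    if nu == 0.5:
        return np.exp(-s)
    if nu == 1.5:
        return (1.0 + s) * np.exp(-s)
    if nu == 2.5:
        return (1.0 + s + s**2 / 3.0) * np.exp(-s)
    raise ValueError("unsupported nu")

x = np.linspace(0.0, 1.0, 50)
r = np.abs(x[:, None] - x[None, :])  # pairwise distances

# Additive mixture: per the single-output result above, the rough
# nu = 0.5 component dictates the smoothness of the whole sum, and the
# mixing weights (0.7, 0.3) are not identifiable from data.
K_add = 0.7 * matern(r, 0.2, 0.5) + 0.3 * matern(r, 0.2, 2.5)

# Multiplicative multi-output kernel: output covariance A times a single
# base kernel k. Rescaling (c * A, k / c) yields the same K for any
# c > 0, so A is identifiable only up to that multiplicative constant.
A = np.array([[1.0, 0.6],
              [0.6, 1.5]])
K_mult = np.kron(A, matern(r, 0.2, 1.5))  # shape (100, 100)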